Distributed statistical inference for massive data
Authors
Abstract
This paper considers distributed statistical inference for general symmetric statistics in the context of massive data with efficient computation. Estimation efficiency and asymptotic distributions are provided, which reveal different results between the nondegenerate and degenerate cases and show that the number of subsets plays an important role. Two bootstrap methods are proposed and analyzed to approximate the underlying distribution with improved computation over existing methods. The accuracy of the distributional approximation by the bootstrap is studied theoretically. One of the methods, the pseudo-distributed bootstrap, is particularly attractive if the number of datasets is large, as it directly resamples the subset-based statistics, assumes less stringent conditions, and its performance can be improved by studentization.
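The idea of averaging a symmetric statistic over data subsets and then resampling the subset-based statistics can be illustrated with a minimal Python sketch. It is not the paper's exact procedure: the choice of the sample variance as the symmetric statistic, the number of subsets (200), the number of resamples (2000), and all function names are assumptions made for illustration only.

```python
import numpy as np


def subset_statistics(data, n_subsets, statistic):
    """Split the data into n_subsets blocks and evaluate the statistic on each block."""
    blocks = np.array_split(data, n_subsets)
    return np.array([statistic(block) for block in blocks])


def distributed_statistic(subset_stats):
    """Combine the subset-based statistics by simple averaging."""
    return float(np.mean(subset_stats))


def resample_subset_statistics(subset_stats, n_boot, rng):
    """Resample the subset-based statistics with replacement and average each resample.

    Only the K subset-level values are resampled, never the raw data,
    so the cost per resample is O(K) regardless of the full sample size.
    """
    k = len(subset_stats)
    idx = rng.integers(0, k, size=(n_boot, k))
    return subset_stats[idx].mean(axis=1)


rng = np.random.default_rng(0)
# Simulated data (assumption): normal draws with true variance 4.
data = rng.normal(loc=0.0, scale=2.0, size=100_000)

# The unbiased sample variance is a symmetric statistic
# (a U-statistic with kernel (x - y)**2 / 2).
stats = subset_statistics(data, 200, lambda block: np.var(block, ddof=1))
estimate = distributed_statistic(stats)

# Percentile interval from the resampled averages (illustrative only).
boot = resample_subset_statistics(stats, 2000, rng)
lower, upper = np.quantile(boot, [0.025, 0.975])
```

In this sketch the resampling step touches only the 200 subset-level values, which is why such a scheme stays cheap when the full dataset is very large. Studentization, which the abstract says can improve performance, is not shown here.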
Similar resources
Distributed Sampling Storage for Statistical Analysis of Massive Sensor Data
Cyber-physical systems interconnect the cyber world with the physical world in which sensors are massively networked to monitor the physical world. Various services are expected to be able to use sensor data reflecting the physical world with information technology. Given this expectation, it is important to simultaneously provide timely access to massive data and reduce storage costs. We propo...
Statistical Inference via Empirical Bayes Approach for Stationary and Dynamic Contingency Tables
No abstract available.
Scalable Algorithms for Distributed Statistical Inference
The classical framework on distributed inference considers a set of nodes taking measurements and a fusion center making the final decision on the underlying phenomenon, without dealing with the issue of transporting the measurements to the fusion center. Such an approach introduces significant overhead in communication. Communicating all the raw data for inference is not scalable: in this case...
Approximated Bayesian Inference for Massive Streaming Data
Extracting meaningful information out of massive streaming data is a significant challenge due to the high dimensionality of the inference problem and limits on available computational power and memory. While Bayesian models often convey significant inferential advantages, standard computational algorithms relying on Markov chain Monte Carlo are infeasible to apply. This motivates online variat...
Communication-Efficient Distributed Statistical Inference
We present a Communication-efficient Surrogate Likelihood (CSL) framework for solving distributed statistical inference problems. CSL provides a communication-efficient surrogate to the global likelihood that can be used for low-dimensional estimation, high-dimensional regularized estimation and Bayesian inference. For low-dimensional estimation, CSL provably improves upon naive averaging schem...
Journal
Journal title: Annals of Statistics
Year: 2021
ISSN: 0090-5364, 2168-8966
DOI: https://doi.org/10.1214/21-aos2062